GNU Info File | 1993-06-01 | 48KB | 839 lines
This is Info file elisp, produced by Makeinfo-1.55 from the input file
elisp.texi.
This is edition 2.0 of the GNU Emacs Lisp Reference Manual, for
Emacs Version 19.
Published by the Free Software Foundation, 675 Massachusetts Avenue,
Cambridge, MA 02139 USA
Copyright (C) 1990, 1991, 1992, 1993 Free Software Foundation, Inc.
Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.
Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.
Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.
File: elisp, Node: Case Changes, Next: Text Properties, Prev: Columns, Up: Text
Case Changes
============
The case change commands described here work on text in the current
buffer. *Note Character Case::, for case conversion commands that work
on strings and characters. *Note Case Table::, for how to customize
which characters are upper or lower case and how to convert them.
- Command: capitalize-region START END
This function capitalizes all words in the region defined by START
and END. To capitalize means to convert each word's first
character to upper case and convert the rest of each word to lower
case. The function returns `nil'.
If one end of the region is in the middle of a word, the part of
the word within the region is treated as an entire word.
When `capitalize-region' is called interactively, START and END
are point and the mark, with the smaller one first.
---------- Buffer: foo ----------
This is the contents of the 5th foo.
---------- Buffer: foo ----------
(capitalize-region 1 37)
=> nil
---------- Buffer: foo ----------
This Is The Contents Of The 5th Foo.
---------- Buffer: foo ----------
- Command: downcase-region START END
This function converts all of the letters in the region defined by
START and END to lower case. The function returns `nil'.
When `downcase-region' is called interactively, START and END are
point and the mark, with the smaller one first.
- Command: upcase-region START END
This function converts all of the letters in the region defined by
START and END to upper case. The function returns `nil'.
When `upcase-region' is called interactively, START and END are
point and the mark, with the smaller one first.
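As a sketch of how the three region commands differ, the following applies each one to the example text used for `capitalize-region'. It assumes a later Emacs in which the `with-temp-buffer' macro exists; `my-case-demo' is a hypothetical helper, not part of Emacs.

```elisp
;; Hypothetical helper: insert sample text, apply a region case
;; command to the whole buffer, and return the resulting text.
(defun my-case-demo (command)
  "Apply COMMAND (for example `upcase-region') to sample text."
  (with-temp-buffer
    (insert "This is the contents of the 5th foo.")
    ;; START and END cover the entire buffer.
    (funcall command (point-min) (point-max))
    (buffer-string)))

(my-case-demo 'capitalize-region) ; => "This Is The Contents Of The 5th Foo."
(my-case-demo 'upcase-region)     ; => "THIS IS THE CONTENTS OF THE 5TH FOO."
(my-case-demo 'downcase-region)   ; => "this is the contents of the 5th foo."
```

Passing `(point-min)' and `(point-max)' avoids hard-coding buffer positions.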
- Command: capitalize-word COUNT
This function capitalizes COUNT words after point, moving point
over as it does. To capitalize means to convert each word's first
character to upper case and convert the rest of each word to lower
case. If COUNT is negative, the function capitalizes the -COUNT
previous words but does not move point. The value is `nil'.
If point is in the middle of a word, the part of the word before
point (if moving forward) or after point (if operating backward)
is ignored. The rest is treated as an entire word.
When `capitalize-word' is called interactively, COUNT is set to
the numeric prefix argument.
- Command: downcase-word COUNT
This function converts the COUNT words after point to all lower
case, moving point over as it does. If COUNT is negative, it
converts the -COUNT previous words but does not move point. The
value is `nil'.
When `downcase-word' is called interactively, COUNT is set to the
numeric prefix argument.
- Command: upcase-word COUNT
This function converts the COUNT words after point to all upper
case, moving point over as it does. If COUNT is negative, it
converts the -COUNT previous words but does not move point. The
value is `nil'.
When `upcase-word' is called interactively, COUNT is set to the
numeric prefix argument.
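The difference between a positive and a negative COUNT can be seen in a small sketch. It assumes a later Emacs with `with-temp-buffer'; `my-word-case-demo' is a hypothetical helper that reports the buffer text and where point ends up.

```elisp
;; Hypothetical helper showing positive and negative COUNT.
(defun my-word-case-demo ()
  "Return (TEXT POINT-AFTER-FORWARD POINT-AFTER-BACKWARD)."
  (with-temp-buffer
    (insert "alpha beta gamma")
    (goto-char (point-min))
    ;; Positive COUNT: convert the next two words; point moves past them.
    (upcase-word 2)
    (let ((after-forward (point)))
      (goto-char (point-max))
      ;; Negative COUNT: convert the previous word; point does not move.
      (capitalize-word -1)
      (list (buffer-string) after-forward (point)))))

(my-word-case-demo) ; => ("ALPHA BETA Gamma" 11 17)
```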
File: elisp, Node: Text Properties, Next: Substitution, Prev: Case Changes, Up: Text
Text Properties
===============
Each character position in a buffer or a string can have a "text
property list", much like the property list of a symbol. The properties
belong to a particular character at a particular place, such as the
letter `T' at the beginning of this sentence or the first `o' in
`foo'--if the same character occurs in two different places, the two
occurrences generally have different properties.
Each property has a name, which is usually a symbol, and an
associated value, which can be any Lisp object--just as for properties
of symbols (*note Property Lists::.).
If a character has a `category' property, we call it the "category"
of the character. It should be a symbol. The properties of the symbol
serve as defaults for the properties of the character.
Copying text between strings and buffers preserves the properties
along with the characters; this includes such diverse functions as
`substring', `insert', and `buffer-substring'.
* Menu:
* Examining Properties:: Looking at the properties of one character.
* Changing Properties:: Setting the properties of a range of text.
* Property Search:: Searching for where a property changes value.
* Special Properties:: Particular properties with special meanings.
* Not Intervals:: Why text properties do not use
Lisp-visible text intervals.
File: elisp, Node: Examining Properties, Next: Changing Properties, Up: Text Properties
Examining Text Properties
-------------------------
The simplest way to examine text properties is to ask for the value
of a particular property of a particular character. For that, use
`get-text-property'. Use `text-properties-at' to get the entire
property list of a character. *Note Property Search::, for functions
to examine the properties of a number of characters at once.
These functions handle both strings and buffers. Keep in mind that
positions in a string start from 0, whereas positions in a buffer start
from 1.
- Function: get-text-property POS PROP &optional OBJECT
This function returns the value of the PROP property of the
character after position POS in OBJECT (a buffer or string). The
argument OBJECT is optional and defaults to the current buffer.
If there is no PROP property strictly speaking, but the character
has a category which is a symbol, then `get-text-property' returns
the PROP property of that symbol.
- Function: text-properties-at POSITION &optional OBJECT
This function returns the list of properties held by the character
at POSITION in the string or buffer OBJECT. If OBJECT is `nil',
it defaults to the current buffer.
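Here is a minimal sketch of both functions applied to a string, including the fallback to a category symbol. It uses `put-text-property' (described in the next node) to set things up; `my-examine-demo', `my-category' and the property name `my-note' are hypothetical names chosen for illustration.

```elisp
;; Hypothetical example: examine the properties of a string.
;; String positions start at 0.
(defun my-examine-demo ()
  (let ((s (copy-sequence "foo bar")))   ; fresh string we may modify
    ;; Give "foo" (positions 0-2) a `face' property.
    (put-text-property 0 3 'face 'bold s)
    ;; `my-category' is a hypothetical category symbol; its own
    ;; properties serve as defaults for characters in that category.
    (put 'my-category 'my-note "from category")
    (put-text-property 4 7 'category 'my-category s)
    (list (get-text-property 0 'face s)     ; bold
          (get-text-property 4 'face s)     ; nil: "bar" has no face
          (get-text-property 5 'my-note s)  ; found via the category symbol
          (text-properties-at 1 s))))       ; the whole property list

(my-examine-demo) ; => (bold nil "from category" (face bold))
```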
File: elisp, Node: Changing Properties, Next: Property Search, Prev: Examining Properties, Up: Text Properties
Changing Text Properties
------------------------
The primitives for changing properties apply to a specified range of
text. The function `set-text-properties' (see end of section) sets the
entire property list of the text in that range; more often, it is
useful to add, change, or delete just certain properties specified by
name.
Since text properties are considered part of the buffer's contents,
and can affect how the buffer looks on the screen, any change in the
text properties is considered a buffer modification. Buffer text
property changes are undoable.
- Function: add-text-properties START END PROPS &optional OBJECT
This function modifies the text properties for the text between
START and END in the string or buffer OBJECT. If OBJECT is `nil',
it defaults to the current buffer.
The argument PROPS specifies which properties to change. It
should have the form of a property list (*note Property Lists::.):
a list whose elements alternate between property names and the
corresponding values.
The return value is `t' if the function actually changed some
property's value; `nil' otherwise (if PROPS is `nil' or its values
agree with those in the text).
For example, here is how to set the `comment' property to `t' for
a range of text:
(add-text-properties (region-beginning)
(region-end)
(list 'comment t))
- Function: put-text-property START END PROP VALUE &optional OBJECT
This function sets the PROP property to VALUE for the text between
START and END in the string or buffer OBJECT. If OBJECT is `nil',
it defaults to the current buffer.
- Function: remove-text-properties START END PROPS &optional OBJECT
This function deletes specified text properties from the text
between START and END in the string or buffer OBJECT. If OBJECT
is `nil', it defaults to the current buffer.
The argument PROPS specifies which properties to delete. It
should have the form of a property list (*note Property Lists::.):
a list whose elements alternate between property names and the
corresponding values. The property names mentioned in PROPS are
the ones deleted from the text. The values associated in PROPS
with these names do not matter.
The return value is `t' if the function actually changed some
property's value; `nil' otherwise (if PROPS is `nil' or if none of
the text had any of those properties).
- Function: set-text-properties START END PROPS &optional OBJECT
This function completely replaces the text property list for the
text between START and END in the string or buffer OBJECT. If
OBJECT is `nil', it defaults to the current buffer.
The argument PROPS is the new property list. It should have the
form of a list whose elements alternate between property names and
the corresponding values.
After `set-text-properties' returns, all the characters in the
specified range have identical properties.
If PROPS is `nil', the effect is to get rid of all properties from
the specified range of text. Here's an example:
(set-text-properties (region-beginning)
(region-end)
nil)
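The return values described above can be checked on a string with a short sketch. `my-change-demo' is a hypothetical helper and `comment' is just an example property name, as in the `add-text-properties' example.

```elisp
;; Hypothetical helper exercising the property-changing primitives.
(defun my-change-demo ()
  (let ((s (copy-sequence "hello")))
    (list
     ;; `comment' is not present yet, so something changes: t.
     (add-text-properties 0 5 '(comment t) s)
     ;; Same values again: nothing changes, so nil.
     (add-text-properties 0 5 '(comment t) s)
     ;; Only the names in PROPS matter; the value nil is ignored.
     (remove-text-properties 0 2 '(comment nil) s)
     (get-text-property 0 'comment s)   ; nil: removed here
     (get-text-property 3 'comment s)   ; t: outside the removed range
     ;; Set a single property with `put-text-property'...
     (progn (put-text-property 0 5 'face 'bold s)
            (get-text-property 2 'face s))
     ;; ...then replace the entire property list of the string.
     (progn (set-text-properties 0 5 '(face italic) s)
            (text-properties-at 3 s)))))

(my-change-demo) ; => (t nil t nil t bold (face italic))
```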
File: elisp, Node: Property Search, Next: Special Properties, Prev: Changing Properties, Up: Text Properties
Property Search Functions
-------------------------
In typical use of text properties, most of the time several or many
consecutive characters have the same value for a property. Rather than
writing your programs to examine characters one by one, it is much
faster to process chunks of text that have the same property value.
Here are functions you can use to do this. In all cases, OBJECT
defaults to the current buffer.
- Function: next-property-change POS &optional OBJECT
The function scans the text forward from position POS in the
string or buffer OBJECT till it finds a change in some text
property, then returns the position of the change. In other
words, it returns the position of the first character beyond POS
whose properties are not identical to those of the character just
after POS.
The value is `nil' if the properties remain unchanged all the way
to the end of OBJECT. If the value is non-`nil', it is a position
greater than POS, never equal.
Here is an example of how to scan the buffer by chunks of text
within which all properties are constant:
(while (not (eobp))
(let ((plist (text-properties-at (point)))
(next-change
(or (next-property-change (point) (current-buffer))
(point-max))))
;; Process the text from point to NEXT-CHANGE here;
;; PLIST holds the properties shared by that whole chunk.
(goto-char next-change)))
- Function: next-single-property-change POS PROP &optional OBJECT
The function scans the text forward from position POS in the
string or buffer OBJECT till it finds a change in the PROP
property, then returns the position of the change. In other
words, it returns the position of the first character beyond POS
whose PROP property differs from that of the character just after
POS.
The value is `nil' if the PROP property remains unchanged all the way
to the end of OBJECT. If the value is non-`nil', it is a position
greater than POS, never equal.
- Function: previous-property-change POS &optional OBJECT
This is like `next-property-change', but scans back from POS
instead of forward. If the value is non-`nil', it is a position
always strictly less than POS.
- Function: previous-single-property-change POS PROP &optional OBJECT
This is like `next-single-property-change', but scans back from POS
instead of forward. If the value is non-`nil', it is a position
always strictly less than POS.
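As a sketch of the single-property variant, the following collects the runs of one property in a string, using `(length STRING)' when `next-single-property-change' returns `nil'. `my-property-runs' is a hypothetical helper, not an Emacs function.

```elisp
;; Hypothetical helper: split STRING into maximal runs over which
;; PROP has a constant value.
(defun my-property-runs (string prop)
  "Return a list of (START END VALUE) runs of PROP in STRING."
  (let ((pos 0)
        (runs nil))
    (while (< pos (length string))
      ;; nil means PROP stays the same up to the end of STRING.
      (let ((next (or (next-single-property-change pos prop string)
                      (length string))))
        (push (list pos next (get-text-property pos prop string)) runs)
        (setq pos next)))
    (nreverse runs)))

(let ((s (copy-sequence "aaabbbccc")))
  (put-text-property 3 6 'face 'bold s)
  (my-property-runs s 'face))
;; => ((0 3 nil) (3 6 bold) (6 9 nil))
```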
File: elisp, Node: Special Properties, Next: Not Intervals, Prev: Property Search, Up: Text Properties
Special Properties
------------------
If a character has a `category' property, we call it the "category"
of the character. It should be a symbol. The properties of the symbol
serve as defaults for the properties of the character.
You can use the property `face' to control the font and color of
text. *Note Faces::, for more information. This feature is temporary;
in the future, we may replace it with other ways of specifying how to
display text.
The property `mouse-face' is used instead of `face' when the mouse
is on or near the character. For this purpose, "near" means that all
text between the character and where the mouse is has the same
`mouse-face' property value.
You can specify a different keymap for a portion of the text by means
of a `local-map' property. The property's value, for the character
after point, replaces the buffer's local map. *Note Active Keymaps::.
If a character has the property `read-only', then modifying that
character is not allowed. Any command that would do so gets an error.
If a character has the property `modification-hooks', then its value
should be a list of functions; modifying that character calls all of
those functions. Each function receives two arguments: the beginning
and end of the part of the buffer being modified. Note that if a
particular modification hook function appears on several characters
being modified by a single primitive, you can't predict how many times
the function will be called.
Insertion of text does not, strictly speaking, change any existing
character, so there is a special rule for insertion. It compares the
`read-only' properties of the two surrounding characters; if they are
non-`nil' and `eq' to each other, then the insertion is not allowed.
Assuming insertion is allowed, it then gets the `modification-hooks'
properties of those characters and calls all the functions in each of
them. (If a function appears on both characters, it may be called once
or twice.)
See also *Note Change Hooks::, for other hooks that are called when
you change text in a buffer.
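Here is a minimal sketch of a `modification-hooks' property, assuming a later Emacs with `with-temp-buffer'. The variable `my-modification-log' and the functions `my-log-modification' and `my-modification-demo' are hypothetical. Since the number of calls is unpredictable, the sketch only checks that the deleted range was reported at least once.

```elisp
(defvar my-modification-log nil
  "Hypothetical list of (BEG . END) pairs recorded by `my-log-modification'.")

(defun my-log-modification (beg end)
  "Hypothetical modification hook: record the range being modified."
  (push (cons beg end) my-modification-log))

(defun my-modification-demo ()
  (setq my-modification-log nil)
  (with-temp-buffer
    (insert "hello world")
    ;; Attach the hook to the characters of "hello" (positions 1-5).
    (put-text-property 1 6 'modification-hooks '(my-log-modification))
    ;; Deleting "el" modifies characters that carry the hook.
    (delete-region 2 4)
    (buffer-string)))

(my-modification-demo)   ; => "hlo world"
my-modification-log      ; contains (2 . 4)
```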
The special properties `point-entered' and `point-left' record hook
functions that report motion of point. Each time point moves, Emacs
compares these two property values:
* the `point-left' property of the character after the old location,
and
* the `point-entered' property of the character after the new
location.
If these two values differ, each of them is called (if not `nil') with
two arguments: the old value of point, and the new one.
The same comparison is made for the characters before the old and new
locations. The result may be to execute two `point-left' functions
(which may be the same function) and/or two `point-entered' functions
(which may be the same function). The `point-left' functions are
always called before the `point-entered' functions.
A primitive function may examine characters at various positions
without moving point to those positions. Only an actual change in the
value of point runs these hook functions.
File: elisp, Node: Not Intervals, Prev: Special Properties, Up: Text Properties
Why Text Properties are not Intervals
-------------------------------------
Some editors that support adding attributes to text in the buffer do
so by letting the user specify "intervals" within the text, and adding
the properties to the intervals. Those editors permit the user or the
programmer to determine where individual intervals start and end. We
deliberately provided a different sort of interface in Emacs Lisp to
avoid certain paradoxical behavior associated with text modification.
If the actual subdivision into intervals is meaningful, that means
you can distinguish between a buffer that is just one interval with a
certain property, and a buffer containing the same text subdivided into
two intervals, both of which have that property.
Suppose you take the buffer with just one interval and kill part of
the text. The text remaining in the buffer is one interval, and the
copy in the kill ring (and the undo list) becomes a separate interval.
Then if you undo the kill, you get two intervals with the same
properties. Thus, the distinction can't be preserved when editing
happens.
But suppose we "fix" this problem by coalescing the two intervals
when the text is inserted. That works fine if the buffer originally was
a single interval. But if it was two intervals, and the killed text
equals one of them, then undoing the kill yields just one interval.
Again, the distinction can't be preserved.
Insertion of text at the border between intervals also raises
questions that have no satisfactory answer.
However, it is easy to arrange for editing to behave consistently for
questions of the form, "What are the properties of this character?" So
we have decided these are the only questions that make sense; we have
not implemented asking questions about where intervals start or end.
For practical purposes, the property search functions serve in place
of explicit interval boundaries. You can think of them as finding the
boundaries of intervals, assuming that intervals are always coalesced
whenever possible. *Note Property Search::.
Emacs also provides explicit intervals as a presentation feature; see
*Note Overlays::.
File: elisp, Node: Substitution, Next: Underlining, Prev: Text Properties, Up: Text
Substituting for a Character Code
=================================
The following functions replace characters within a specified region
based on their character codes.
- Function: subst-char-in-region START END OLD-CHAR NEW-CHAR &optional
NOUNDO
This function replaces all occurrences of the character OLD-CHAR
with the character NEW-CHAR in the region of the current buffer
defined by START and END.
If NOUNDO is non-`nil', then `subst-char-in-region' does not
record the change for undo and does not mark the buffer as
modified. This feature is useful for changes which are not
considered significant, such as when Outline mode changes visible
lines to invisible lines and vice versa.
`subst-char-in-region' does not move point and returns `nil'.
---------- Buffer: foo ----------
This is the contents of the buffer before.
---------- Buffer: foo ----------
(subst-char-in-region 1 20 ?i ?X)
=> nil
---------- Buffer: foo ----------
ThXs Xs the contents of the buffer before.
---------- Buffer: foo ----------
- Function: translate-region START END TABLE
This function applies a translation table to the characters in the
buffer between positions START and END.
The translation table TABLE is a string; `(aref TABLE OCHAR)'
gives the translated character corresponding to OCHAR. If the
length of TABLE is less than 256, any characters with codes equal to
or greater than the length of TABLE are not altered by the translation.
The return value of `translate-region' is the number of characters
which were actually changed by the translation. This does not
count characters which were mapped into themselves in the
translation table.
This function is available in Emacs versions 19 and later.
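A translation table is easiest to build by starting from the identity mapping and then changing individual entries. The sketch below assumes a later Emacs with `with-temp-buffer'; `my-translate-demo' is a hypothetical helper. Its table has only 128 entries, so characters with codes of 128 or more are left alone.

```elisp
;; Hypothetical helper: translate every `a' to `o' in a scratch buffer.
(defun my-translate-demo ()
  (let ((table (make-string 128 0)))
    ;; Start from the identity mapping for codes 0-127.
    (dotimes (i 128)
      (aset table i i))
    ;; Then map `a' to `o'.
    (aset table ?a ?o)
    (with-temp-buffer
      (insert "banana split")
      (translate-region (point-min) (point-max) table)
      (buffer-string))))

(my-translate-demo) ; => "bonono split"
```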
File: elisp, Node: Underlining, Next: Registers, Prev: Substitution, Up: Text
Underlining
===========
The underlining commands are somewhat obsolete. The
`underline-region' function actually inserts `_^H' before each
appropriate character in the region. This command provides a minimal
text formatting feature that might work on your printer; however, we
recommend instead that you use more powerful text formatting facilities,
such as Texinfo.
- Command: underline-region START END
This function underlines all nonblank characters in the region
defined by START and END. That is, an underscore character and a
backspace character are inserted just before each non-whitespace
character in the region. The backspace characters are intended to
cause overstriking, but in Emacs they display as either `\010' or
`^H', depending on the setting of `ctl-arrow'. There is no way to
see the effect of the overstriking within Emacs. The value is
`nil'.
- Command: ununderline-region START END
This function removes all underlining (overstruck underscores) in
the region defined by START and END. The value is `nil'.
File: elisp, Node: Registers, Next: Change Hooks, Prev: Underlining, Up: Text
Registers
=========
A register is a sort of variable used in Emacs editing that can hold
a marker, a string, a rectangle, a window configuration (of one frame),
or a frame configuration (of all frames). Each register is named by a
single character. All characters, including control and meta characters
(but with the exception of `C-g'), can be used to name registers.
Thus, there are 255 possible registers. A register is designated in
Emacs Lisp by a character which is its name.
The functions in this section return unpredictable values unless
otherwise stated.
- Variable: register-alist
This variable is an alist of elements of the form
`(NAME . CONTENTS)'. Normally, there is one element for each Emacs
register that has been used.
The object NAME is a character (an integer) identifying the
register. The object CONTENTS is a string, marker, or list
representing the register contents. A string represents text
stored in the register. A marker represents a position. A list
represents a rectangle; its elements are strings, one per line of
the rectangle.
- Command: view-register REG
This command displays what is contained in register REG.
- Function: get-register REG
This function returns the contents of the register REG, or `nil'
if it has no contents.
- Function: set-register REG VALUE
This function sets the contents of register REG to VALUE. A
register can be set to any value, but the other register functions
expect only certain data types. The return value is VALUE.
- Command: point-to-register REG
This command stores both the current location of point and the
current buffer in register REG as a marker.
- Command: jump-to-register REG
- Command: register-to-point REG
This command restores the status recorded in register REG.
If REG contains a marker, it moves point to the position stored in
the marker. Since both the buffer and the location within the
buffer are stored by the `point-to-register' function, this
command can switch you to another buffer.
If REG contains a window configuration or a frame configuration,
`jump-to-register' restores that configuration.
- Command: insert-register REG &optional BEFOREP
This command inserts the contents of register REG into the current
buffer.
Normally, this command puts point before the inserted text, and the
mark after it. However, if the optional second argument BEFOREP
is non-`nil', it puts the mark before and point after. You can
pass a non-`nil' second argument BEFOREP to this function
interactively by supplying any prefix argument.
If the register contains a rectangle, then the rectangle is
inserted with its upper left corner at point. This means that
text is inserted in the current line and underneath it on
successive lines.
If the register contains something other than saved text (a
string) or a rectangle (a list), currently useless things happen.
This may be changed in the future.
- Command: copy-to-register REG START END &optional DELETE-FLAG
This command copies the region from START to END into register
REG. If DELETE-FLAG is non-`nil', it deletes the region from the
buffer after copying it into the register.
- Command: prepend-to-register REG START END &optional DELETE-FLAG
This command prepends the region from START to END to the text
already in register REG. If DELETE-FLAG is non-`nil', it deletes the
region from the buffer after copying it to the register.
- Command: append-to-register REG START END &optional DELETE-FLAG
This command appends the region from START to END to the text
already in register REG. If DELETE-FLAG is non-`nil', it deletes
the region from the buffer after copying it to the register.
- Command: copy-rectangle-to-register REG START END &optional
DELETE-FLAG
This command copies a rectangular region from START to END into
register REG. If DELETE-FLAG is non-`nil', it deletes the region
from the buffer after copying it to the register.
- Command: window-configuration-to-register REG
This function stores the window configuration of the selected
frame in register REG.
- Command: frame-configuration-to-register REG
This function stores the current frame configuration in register
REG.
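The following sketch saves text with `copy-to-register', reads it back with `get-register', and looks up the corresponding `register-alist' element. It assumes a later Emacs with `with-temp-buffer'; `my-register-demo' is a hypothetical helper, and it inserts the string with plain `insert' rather than `insert-register' to keep point and mark handling out of the picture.

```elisp
;; Hypothetical helper: store part of a buffer in register ?r and reuse it.
(defun my-register-demo ()
  (with-temp-buffer
    (insert "hello world")
    ;; Save "hello" (positions 1-5) in register ?r without deleting it.
    (copy-to-register ?r 1 6)
    (goto-char (point-max))
    (insert " / ")
    ;; `get-register' returns the saved string itself.
    (insert (get-register ?r))
    (list (buffer-string)
          ;; The element has the form (NAME . CONTENTS).
          (cdr (assq ?r register-alist)))))

(my-register-demo) ; => ("hello world / hello" "hello")
```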
File: elisp, Node: Change Hooks, Prev: Registers, Up: Text
Change Hooks
============
These hook variables let you arrange to take notice of all changes in
all buffers (or in a particular buffer, if you make them buffer-local).
See also *Note Special Properties::, for how to detect changes to
specific parts of the text.
- Variable: before-change-function
If this variable is non-`nil', then it should be a function; the
function is called before any buffer modification. Its arguments
are the beginning and end of the region that is going to change,
represented as integers. The buffer that's about to change is
always the current buffer.
- Variable: after-change-function
If this variable is non-`nil', then it should be a function; the
function is called after any buffer modification. It receives
three arguments: the beginning and end of the region just changed,
and the length of the text that existed before the change. (To
get the current length, subtract the region beginning from the
region end.) All three arguments are integers. The buffer that has
just changed is always the current buffer.
Both of these variables are temporarily bound to `nil' during the
time that either of these hooks is running. This means that if one of
these functions changes the buffer, that change won't run these
functions. If you do want the hook function to be run recursively,
write your hook functions to bind these variables back to their usual
values.
- Variable: first-change-hook
This variable is a normal hook; its hook functions are run using
`run-hooks' whenever a buffer is changed that was previously in
the unmodified state.
The variables described in this section are meaningful only starting
with Emacs version 19.
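In later Emacs versions the single-function variables above were superseded by the hook lists `before-change-functions' and `after-change-functions', whose functions receive the same arguments. The sketch below uses `after-change-functions' so that it runs on a current Emacs; `my-change-log', `my-record-change' and `my-change-hook-demo' are hypothetical names. It shows the third argument: 0 for a pure insertion, and the deleted length for a deletion.

```elisp
(defvar my-change-log nil
  "Hypothetical list of (BEG END OLD-LEN) triples recorded after changes.")

(defun my-record-change (beg end old-len)
  "Hypothetical after-change function: record its three arguments."
  (push (list beg end old-len) my-change-log))

(defun my-change-hook-demo ()
  (setq my-change-log nil)
  (with-temp-buffer
    ;; LOCAL argument t: only this buffer runs the function.
    (add-hook 'after-change-functions #'my-record-change nil t)
    (insert "abc")        ; 3 characters inserted where none existed
    (delete-region 2 4)   ; "bc" deleted, leaving an empty region
    (nreverse my-change-log)))

(my-change-hook-demo) ; => ((1 4 0) (2 2 2))
```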
File: elisp, Node: Searching and Matching, Next: Syntax Tables, Prev: Text, Up: Top
Searching and Matching
**********************
GNU Emacs provides two ways to search through a buffer for specified
text: exact string searches and regular expression searches. After a
regular expression search, you can identify the text matched by parts of
the regular expression by examining the "match data".
* Menu:
* String Search:: Search for an exact match.
* Regular Expressions:: Describing classes of strings.
* Regexp Search:: Searching for a match for a regexp.
* Replacement:: Internals of `query-replace'.
* Match Data:: Finding out which part of the text matched
various parts of a regexp, after regexp search.
* Standard Regexps:: Useful regexps for finding sentences, pages,...
* Searching and Case:: Case-independent or case-significant searching.
File: elisp, Node: String Search, Next: Regular Expressions, Up: Searching and Matching
Searching for Strings
=====================
These are the primitive functions for searching through the text in a
buffer. They are meant for use in programs, but you may call them
interactively. If you do so, they prompt for the search string; LIMIT
and NOERROR are set to `nil', and REPEAT is set to 1.
- Command: search-forward STRING &optional LIMIT NOERROR REPEAT
This function searches forward from point for an exact match for
STRING. If successful, it sets point to the end of the occurrence
found, and returns the new value of point. If no match is found,
the value and side effects depend on NOERROR (see below).
In the following example, point is positioned at the beginning of
the line. Then `(search-forward "fox")' is evaluated in the
minibuffer and point is left after the last letter of `fox':
---------- Buffer: foo ----------
-!-The quick brown fox jumped over the lazy dog.
---------- Buffer: foo ----------
(search-forward "fox")
=> 20
---------- Buffer: foo ----------
The quick brown fox-!- jumped over the lazy dog.
---------- Buffer: foo ----------
The argument LIMIT specifies the upper bound to the search. (It
must be a position in the current buffer.) No match extending
after that position is accepted. If LIMIT is omitted or `nil', it
defaults to the end of the accessible portion of the buffer.
What happens when the search fails depends on the value of
NOERROR. If NOERROR is `nil', a `search-failed' error is
signaled. If NOERROR is `t', `search-forward' returns `nil' and
does nothing. If NOERROR is neither `nil' nor `t', then
`search-forward' moves point to the upper bound and returns `nil'.
(It would be more consistent now to return the new position of
point in that case, but some programs may depend on a value of
`nil'.)
If REPEAT is non-`nil', then the search is repeated that many
times. Point is positioned at the end of the last match.
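The effect of NOERROR and REPEAT can be summarized in one sketch on the example buffer. It assumes a later Emacs with `with-temp-buffer', in which a successful `search-forward' returns the new value of point; `my-search-demo' is a hypothetical helper.

```elisp
;; Hypothetical helper showing success, NOERROR and REPEAT.
(defun my-search-demo ()
  (with-temp-buffer
    (insert "The quick brown fox jumped over the lazy dog.")
    (goto-char (point-min))
    (list
     ;; Success: point moves to the end of "fox" and is returned.
     (search-forward "fox")
     ;; NOERROR = t: failure returns nil and point does not move.
     (search-forward "cat" nil t)
     (point)
     ;; REPEAT = 2 from the start: stop after the second "o".
     (progn (goto-char (point-min))
            (search-forward "o" nil nil 2))
     ;; NOERROR neither nil nor t: failure moves point to the bound.
     (search-forward "cat" nil 'move)
     (= (point) (point-max)))))

(my-search-demo) ; => (20 nil 20 19 nil t)
```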
- Command: search-backward STRING &optional LIMIT NOERROR REPEAT
This function searches backward from point for STRING. It is just
like `search-forward' except that it searches backwards and leaves
point at the beginning of the match.
- Command: word-search-forward STRING &optional LIMIT NOERROR REPEAT
This function searches forward from point for a "word" match for
STRING. If it finds a match, it sets point to the end of the
match found, and returns the new value of point.
A word search differs from a simple string search in that a word
search *requires* that the words it searches for are present as
entire words (searching for the word `ball' does not match the word
`balls'), and punctuation and spacing are ignored (searching for
`ball boy' does match `ball. Boy!').
In this example, point is first placed at the beginning of the
buffer; the search leaves it between the `y' and the `!'.
---------- Buffer: foo ----------
-!-He said "Please! Find
the ball boy!"
---------- Buffer: foo ----------
(word-search-forward "Please find the ball, boy.")
=> t
---------- Buffer: foo ----------
He said "Please! Find
the ball boy-!-!"
---------- Buffer: foo ----------
If LIMIT is non-`nil' (it must be a position in the current
buffer), then it is the upper bound to the search. The match
found must not extend after that position.
If NOERROR is `t', then `word-search-forward' returns `nil' when a
search fails, instead of signaling an error. If NOERROR is
neither `nil' nor `t', then `word-search-forward' moves point to
LIMIT (or the end of the buffer) and returns `nil'.
If REPEAT is non-`nil', then the search is repeated that many
times. Point is positioned at the end of the last match.
- Command: word-search-backward STRING &optional LIMIT NOERROR REPEAT
This function searches backward from point for a word match to
STRING. This function is just like `word-search-forward' except
that it searches backward and normally leaves point at the
beginning of the match.
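A sketch of the two word-search rules (punctuation and spacing ignored, whole words required) on the example buffer follows. It assumes a later Emacs with `with-temp-buffer' and the default non-`nil' value of `case-fold-search', which lets `find' match `Find'; `my-word-search-demo' is a hypothetical helper.

```elisp
;; Hypothetical helper illustrating word search.
(defun my-word-search-demo ()
  (with-temp-buffer
    (insert "He said \"Please!  Find\nthe ball boy!\"")
    (goto-char (point-min))
    (list
     ;; Punctuation, case and the line break between words are
     ;; ignored; point ends just after "boy", before the `!'.
     (progn (word-search-forward "Please find the ball, boy.")
            (char-after))
     ;; Only whole words match: "bal" is not a word in this buffer.
     (progn (goto-char (point-min))
            (word-search-forward "bal" nil t)))))

(my-word-search-demo) ; => (33 nil), where 33 is the character `!'
```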
File: elisp, Node: Regular Expressions, Next: Regexp Search, Prev: String Search, Up: Searching and Matching
Regular Expressions
===================
A "regular expression" ("regexp", for short) is a pattern that
denotes a (possibly infinite) set of strings. Searching for matches for
a regexp is a very powerful operation. This section explains how to
write regexps; the following section says how to search for them.
* Menu:
* Syntax of Regexps:: Rules for writing regular expressions.
* Regexp Example:: Illustrates regular expression syntax.
File: elisp, Node: Syntax of Regexps, Next: Regexp Example, Up: Regular Expressions
Syntax of Regular Expressions
-----------------------------
Regular expressions have a syntax in which a few characters are
special constructs and the rest are "ordinary". An ordinary character
is a simple regular expression which matches that character and nothing
else. The special characters are `$', `^', `.', `*', `+', `?', `[',
`]' and `\'; no new special characters will be defined in the future.
Any other character appearing in a regular expression is ordinary,
unless a `\' precedes it.
For example, `f' is not a special character, so it is ordinary, and
therefore `f' is a regular expression that matches the string `f' and
no other string. (It does *not* match the string `ff'.) Likewise, `o'
is a regular expression that matches only `o'.
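Functions such as `string-match' look for a match anywhere in a string, so `(string-match "f" "ff")' succeeds. The statement that `f' does not match `ff' is about matching the *whole* string. The sketch below tests for a whole-string match and assumes a modern Emacs. It wraps REGEXP in the `\`' and `\'' anchors described later in this section, and also in a non-capturing "shy" group `\(?: ... \)', which Emacs 19 lacks. The helper name is hypothetical.

```elisp
(defun my-regexp-whole-match-p (regexp string)
  "Return non-nil if REGEXP matches all of STRING, not just part of it."
  (let ((case-fold-search nil))
    ;; \\` and \\' match only at the start and end of STRING; the shy
    ;; group keeps a top-level \\| in REGEXP inside the anchors.
    (string-match-p (concat "\\`\\(?:" regexp "\\)\\'") string)))

(my-regexp-whole-match-p "f" "f")   ; => 0, i.e. a match
(my-regexp-whole-match-p "f" "ff")  ; => nil
(string-match "f" "ff")             ; => 0, a match for part of "ff"
(my-regexp-whole-match-p "o" "o")   ; => 0
```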
Any two regular expressions A and B can be concatenated. The result
is a regular expression which matches a string if A matches some amount
of the beginning of that string and B matches the rest of the string.
As a simple example, we can concatenate the regular expressions `f'
and `o' to get the regular expression `fo', which matches only the
string `fo'. Still trivial. To do something more powerful, you need
to use one of the special characters. Here is a list of them:
`. (Period)'
is a special character that matches any single character except a
newline. Using concatenation, we can make regular expressions
like `a.b' which matches any three-character string which begins
with `a' and ends with `b'.
`*'
is not a construct by itself; it is a suffix that means the
preceding regular expression is to be repeated as many times as
possible. In `fo*', the `*' applies to the `o', so `fo*' matches
one `f' followed by any number of `o's. The case of zero `o's is
allowed: `fo*' does match `f'.
`*' always applies to the *smallest* possible preceding
expression. Thus, `fo*' has a repeating `o', not a repeating `fo'.
The matcher processes a `*' construct by matching, immediately, as
many repetitions as can be found. Then it continues with the rest
of the pattern. If that fails, backtracking occurs, discarding
some of the matches of the `*'-modified construct in case that
makes it possible to match the rest of the pattern. For example,
matching `ca*ar' against the string `caaar', the `a*' first tries
to match all three `a's; but the rest of the pattern is `ar' and
there is only `r' left to match, so this try fails. The next
alternative is for `a*' to match only two `a's. With this choice,
the rest of the regexp matches successfully.
`+'
is a suffix character similar to `*' except that it must match the
preceding expression at least once. So, for example, `ca+r' will
match the strings `car' and `caaaar' but not the string `cr',
whereas `ca*r' would match all three strings.
`?'
is a suffix character similar to `*' except that it can match the
preceding expression either once or not at all. For example,
`ca?r' will match `car' or `cr'; nothing else.
`[ ... ]'
`[' begins a "character set", which is terminated by a `]'. In
the simplest case, the characters between the two form the set.
Thus, `[ad]' matches either one `a' or one `d', and `[ad]*'
matches any string composed of just `a's and `d's (including the
empty string), from which it follows that `c[ad]*r' matches `cr',
`car', `cdr', `caddaar', etc.
Character ranges can also be included in a character set, by
writing two characters with a `-' between them. Thus, `[a-z]'
matches any lower case letter. Ranges may be intermixed freely
with individual characters, as in `[a-z$%.]', which matches any
lower case letter or `$', `%' or a period.
Note that the usual special characters are not special any more
inside a character set. A completely different set of special
characters exists inside character sets: `]', `-' and `^'.
To include a `]' in a character set, make it the first character.
For example, `[]a]' matches `]' or `a'. To include a `-', write
`-' as the first or last character of the set.
To include `^', make it other than the first character in the set.
`[^ ... ]'
`[^' begins a "complement character set", which matches any
character except the ones specified. Thus, `[^a-z0-9A-Z]' matches
all characters *except* letters and digits.
`^' is not special in a character set unless it is the first
character. The character following the `^' is treated as if it
were first (thus, `-' and `]' are not special there).
Note that a complement character set can match a newline, unless
newline is mentioned as one of the characters not to match.
`^'
is a special character that matches the empty string, but only at
the beginning of a line in the text being matched. Otherwise it
fails to match anything. Thus, `^foo' matches a `foo' which occurs
at the beginning of a line.
When matching a string, `^' matches at the beginning of the string
or after a newline character `\n'.
`$'
is similar to `^' but matches only at the end of a line. Thus,
`x+$' matches a string of one or more `x's at the end of a line.
When matching a string, `$' matches at the end of the string or
before a newline character `\n'.
`\'
has two functions: it quotes the special characters (including
`\'), and it introduces additional special constructs.
Because `\' quotes special characters, `\$' is a regular
expression which matches only `$', and `\[' is a regular
expression which matches only `[', and so on.
Note that `\' also has special meaning in the read syntax of Lisp
strings (*note String Type::.), and must be quoted with `\'. For
example, the regular expression that matches the `\' character is
`\\'. To write a Lisp string that contains the characters `\\',
Lisp syntax requires you to quote each `\' with another `\'.
Therefore, the read syntax for a regular expression matching `\'
is `"\\\\"'.
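The special characters above can be tried out with `string-match', which searches a string rather than a buffer. The helper `my-first-match' below is hypothetical. It binds `case-fold-search' to `nil' so that matching is case-sensitive, and returns the matched text or `nil'. Because these are Lisp strings, `\n' inside them denotes a newline character.

```elisp
(defun my-first-match (regexp string)
  "Return the text of the first match for REGEXP in STRING, or nil."
  (let ((case-fold-search nil))
    (when (string-match regexp string)
      (match-string 0 string))))

;; `.' matches any character except newline.
(my-first-match "a.b" "xa-by")              ; => "a-b"
(my-first-match "a.b" "a\nb")               ; => nil
;; `*', `+' and `?' are suffixes applying to the preceding `o' or `a'.
(my-first-match "fo*" "foooo!")             ; => "foooo"
(my-first-match "ca*ar" "caaar")            ; => "caaar" (after backtracking)
(my-first-match "ca+r" "cr")                ; => nil
(my-first-match "ca?r" "cr")                ; => "cr"
;; Character sets, including `]' written first and `-' written last.
(my-first-match "c[ad]*r" "xcaddaary")      ; => "caddaar"
(my-first-match "[]a]+" "x]a]y")            ; => "]a]"
(my-first-match "[a-]+" "b-a-c")            ; => "-a-"
;; Complement sets; note that one can match a newline.
(my-first-match "[^a-z0-9A-Z]+" "abc-+def") ; => "-+"
(my-first-match "[^a]" "\n")                ; => "\n"
;; `^' and `$' also match at line boundaries inside a string.
(string-match "^foo" "xfoo\nfoo")           ; => 5
(string-match "x+$" "axx\nb")               ; => 1
```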
*Please note:* for historical compatibility, special characters are
treated as ordinary ones if they are in contexts where their special
meanings make no sense. For example, `*foo' treats `*' as ordinary
since there is no preceding expression on which the `*' can act. It is
poor practice to depend on this behavior; it is better to quote the special
character anyway, regardless of where it appears.
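This sketch shows the two levels of quoting at work: each `\' in the regexp is written as `\\' in the Lisp string. The function name `my-quoting-demo' is hypothetical. It collects the positions returned by `string-match'.

```elisp
(defun my-quoting-demo ()
  "Return the match positions of some quoted special characters."
  (let ((case-fold-search nil))
    (list
     ;; "\\$" is the two-character regexp \$, matching a literal `$'.
     (string-match "\\$" "cost: $5")
     (string-match "\\[" "a[b")
     ;; "\\\\" is the regexp \\, which matches one backslash.
     (string-match "\\\\" "a\\b")
     ;; The regexp \\ itself is two characters long.
     (length "\\\\")
     ;; Quote `*' even where it would be read as ordinary.
     (string-match "\\*foo" "a*foo"))))

(my-quoting-demo) ; => (6 1 1 2 1)
```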
For the most part, `\' followed by any character matches only that
character. However, there are several exceptions: characters which,
when preceded by `\', are special constructs. Such characters are
always ordinary when encountered on their own. Here is a table of `\'
constructs:
`\|'
specifies an alternative. Two regular expressions A and B with
`\|' in between form an expression that matches anything that
either A or B matches.
Thus, `foo\|bar' matches either `foo' or `bar' but no other string.
`\|' applies to the largest possible surrounding expressions.
Only a surrounding `\( ... \)' grouping can limit the grouping
power of `\|'.
Full backtracking capability exists to handle multiple uses of
`\|'.
`\( ... \)'
is a grouping construct that serves three purposes:
1. To enclose a set of `\|' alternatives for other operations.
Thus, `\(foo\|bar\)x' matches either `foox' or `barx'.
2. To enclose a complicated expression for a suffix character
such as `*' to operate on. Thus, `ba\(na\)*' matches
`bananana', etc., with any (zero or more) number of `na'
strings.
3. To record a matched substring for future reference.
This last application is not a consequence of the idea of a
parenthetical grouping; it is a separate feature which happens to
be assigned as a second meaning to the same `\( ... \)' construct
because there is no conflict in practice between the two meanings.
Here is an explanation of this feature:
`\DIGIT'
matches the same text that was matched by the DIGITth `\( ... \)'
construct in the regular expression.
In other words, after the end of a `\( ... \)' construct, the
matcher remembers the beginning and end of the text matched by
that construct. Then, later on in the regular expression, you can
use `\' followed by DIGIT to mean "match the same text matched by
the DIGITth `\( ... \)' construct."
The strings matching the first nine `\( ... \)' constructs
appearing in a regular expression are assigned numbers 1 through 9
in the order that the open parentheses appear in the regular
expression. So you can use `\1' through `\9' to refer to the text
matched by the corresponding `\( ... \)' constructs.
For example, `\(.*\)\1' matches any newline-free string that is
composed of two identical halves. The `\(.*\)' matches the first
half, which may be anything, but the `\1' that follows must match
the same exact text.
`\`'
matches the empty string, provided it is at the beginning of the
buffer.
`\''
matches the empty string, provided it is at the end of the buffer.
`\='
matches the empty string, provided it is at point.
`\b'
matches the empty string, provided it is at the beginning or end
of a word. Thus, `\bfoo\b' matches any occurrence of `foo' as a
separate word. `\bballs?\b' matches `ball' or `balls' as a
separate word.
`\B'
matches the empty string, provided it is *not* at the beginning or
end of a word.
`\<'
matches the empty string, provided it is at the beginning of a
word.
`\>'
matches the empty string, provided it is at the end of a word.
`\w'
matches any word-constituent character. The current buffer's syntax
table determines which characters these are. *Note Syntax Tables::.
`\W'
matches any character that is not a word-constituent.
`\sCODE'
matches any character whose syntax is CODE. Here CODE is a
character which represents a syntax code: thus, `w' for word
constituent, `-' for whitespace, `(' for open parenthesis, etc.
*Note Syntax Tables::, for a list of the codes.
`\SCODE'
matches any character whose syntax is not CODE.
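Several of the `\' constructs in this table can be exercised with the sketch below. It assumes a modern Emacs, and the helper `my-match-group' is hypothetical. Because `\b', `\w' and `\s' depend on the syntax table, the helper binds the standard syntax table so that the results do not depend on the current buffer's mode.

```elisp
(defun my-match-group (regexp string &optional group)
  "Return the text matched by GROUP (default 0) of REGEXP in STRING.
Return nil if there is no match.  Uses the standard syntax table."
  (let ((case-fold-search nil))
    (with-syntax-table (standard-syntax-table)
      (when (string-match regexp string)
        (match-string (or group 0) string)))))

;; \| alternatives extend as far as possible.
(my-match-group "foo\\|bar" "xbar")              ; => "bar"
(my-match-group "ab\\|cd" "acd")                 ; => "cd"
;; Grouping, and recording a group for `match-string'.
(my-match-group "\\(foo\\|bar\\)x" "a barx" 1)   ; => "bar"
(my-match-group "ba\\(na\\)*" "bananana")        ; => "bananana"
;; \1 must repeat exactly what group 1 matched.
(my-match-group "\\(.*\\)\\1" "abcabc" 1)        ; => "abc"
;; Word boundaries and word constituents.
(my-match-group "\\bballs?\\b" "two balls here") ; => "balls"
(my-match-group "\\bball\\b" "ballroom")         ; => nil
(my-match-group "\\<\\w+" "  foo bar")           ; => "foo"
;; Syntax classes: \s- is whitespace syntax (\s followed by a space works too).
(my-match-group "\\s-+" "a \t b")                ; => " \t "
;; \` and \' anchor at the start and end of a string as well.
(my-match-group "\\`abc" "xabc")                 ; => nil
(my-match-group "abc\\'" "xabc")                 ; => "abc"
;; \= refers to point, so it needs a buffer.
(with-temp-buffer
  (insert "abc")
  (goto-char 2)                                  ; between "a" and "b"
  (looking-at "\\=b"))                           ; => t
```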
Not every string is a valid regular expression. For example, any
string with unbalanced square brackets is invalid, and so is a string
that ends with a single `\'. If an invalid regular expression is
passed to any of the search functions, an `invalid-regexp' error is
signaled.
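A minimal sketch of checking a regexp for validity by catching that error follows. The function name `my-valid-regexp-p' is hypothetical. It relies on the fact that the regexp is compiled before any matching is attempted.

```elisp
(defun my-valid-regexp-p (regexp)
  "Return t if REGEXP is a valid regular expression, else nil."
  (condition-case nil
      (progn
        ;; Compiling REGEXP happens before any matching is attempted,
        ;; so matching against the empty string is enough.
        (string-match-p regexp "")
        t)
    (invalid-regexp nil)))

(my-valid-regexp-p "a[bc]")  ; => t
(my-valid-regexp-p "a[bc")   ; => nil, unbalanced bracket
(my-valid-regexp-p "abc\\")  ; => nil, ends with a single backslash
```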
- Function: regexp-quote STRING
This function returns a regular expression string which matches
exactly STRING and nothing else. This allows you to request an
exact string match when calling a function that wants a regular
expression.
(regexp-quote "^The cat$")
=> "\\^The cat\\$"
One use of `regexp-quote' is to combine an exact string match with
context described as a regular expression. For example, this
searches for the string which is the value of `string', surrounded
by whitespace:
(re-search-forward
(concat "\\s " (regexp-quote string) "\\s "))
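The same idea works with `string-match' on a string. In this sketch, `my-find-spaced' is a hypothetical helper that returns the index of the whitespace character preceding WORD. It binds the standard syntax table so that `\s ' means whitespace regardless of the current mode. Without `regexp-quote', the `.' in `a.b' would also match `axb'.

```elisp
(regexp-quote "^The cat$")  ; => "\\^The cat\\$"

(defun my-find-spaced (word text)
  "Return the index of the whitespace just before WORD in TEXT, or nil.
WORD must be surrounded by whitespace on both sides."
  (with-syntax-table (standard-syntax-table)
    (string-match (concat "\\s " (regexp-quote word) "\\s ") text)))

(my-find-spaced "a.b" "xa.b y a.b z")   ; => 6
(my-find-spaced "a.b" "x axb y")        ; => nil, "." is quoted
(string-match "\\s a.b\\s " "x axb y")  ; => 1, "." unquoted
```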
File: elisp, Node: Regexp Example, Prev: Syntax of Regexps, Up: Regular Expressions
Complex Regexp Example
----------------------
Here is a complicated regexp, used by Emacs to recognize the end of a
sentence together with any whitespace that follows. It is the value of
the variable `sentence-end'.
First, we show the regexp as a string in Lisp syntax to enable you to
distinguish the spaces from the tab characters. The string constant
begins and ends with a double-quote. `\"' stands for a double-quote as
part of the string, `\\' for a backslash as part of the string, `\t'
for a tab and `\n' for a newline.
"[.?!][]\"')}]*\\($\\|\t\\|  \\)[ \t\n]*"
In contrast, if you evaluate the variable `sentence-end', you will
see the following:
sentence-end
=>
"[.?!][]\"')}]*\\($\\|	\\|  \\)[ 	
]*"
Here the tab and the newline appear as the actual characters.
This regular expression contains four parts in succession and can be
deciphered as follows:
`[.?!]'
The first part of the pattern consists of three characters, a
period, a question mark and an exclamation mark, within square
brackets. The match must begin with one of these three characters.
`[]\"')}]*'
The second part of the pattern matches any closing brackets,
parentheses, braces and quotation marks (zero or more of them)
that may follow the period,
question mark or exclamation mark. The `\"' is Lisp syntax for a
double-quote in a string. The `*' at the end indicates that the
immediately preceding regular expression (a character set, in this
case) may be repeated zero or more times.
`\\($\\|\t\\| \\)'
The third part of the pattern matches the whitespace that follows
the end of a sentence: the end of a line, or a tab, or two spaces.
The double backslashes are Lisp string syntax for single
backslashes, so the regexp itself contains `\(', `\|' and `\)'.
These mark the parentheses and vertical bars as regular expression
syntax: the parentheses delimit a group and the vertical bars
separate alternatives. The dollar sign matches the end of a line.
The tab character is written using `\t' and the two spaces are
written as themselves.
`[ \t\n]*'
Finally, the last part of the pattern indicates that the end of
the line or the whitespace following the period, question mark or
exclamation mark may, but need not, be followed by additional
whitespace.
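To see the four parts working together, the sketch below copies the regexp shown above into a constant and reports where the first sentence end in a string begins and ends. It does not read the variable `sentence-end', because later Emacs versions compute sentence ends differently. The names `my-sentence-end-regexp' and `my-sentence-end-bounds' are hypothetical.

```elisp
(defconst my-sentence-end-regexp
  "[.?!][]\"')}]*\\($\\|\t\\|  \\)[ \t\n]*"
  "Copy of the `sentence-end' regexp discussed above, for illustration.")

(defun my-sentence-end-bounds (text)
  "Return (START . END) of the first sentence end in TEXT, or nil."
  (let ((case-fold-search nil))
    (when (string-match my-sentence-end-regexp text)
      (cons (match-beginning 0) (match-end 0)))))

(my-sentence-end-bounds "He left.  Then she came.") ; => (7 . 10), two spaces
(my-sentence-end-bounds "e.g. this")                ; => nil, one space only
(my-sentence-end-bounds "Done!\")  Next")           ; => (4 . 9), closers skipped
(my-sentence-end-bounds "Stop.\n\nNext")            ; => (4 . 7), newlines absorbed
```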